Replication Code for "Age at Immigrant Arrival and Career Mobility: Evidence from Vietnamese Refugee Migration and the Amerasian Homecoming Act" 
JPE Microeconomics
Census Bureau Data Component

Sari Kerr, Wellesley College
William Kerr, Harvard University
Kendall Smith, London Business School

Disclaimer for program disclosure: Any views expressed are those of the authors and not those of the U.S. Census Bureau. The Census Bureau has reviewed this data product to ensure appropriate access, use, and disclosure avoidance protection of the confidential source data used to produce this product. This research was performed at a Federal Statistical Research Data Center under FSRDC Project Number 1571. (CBDRB-FY25-P1571-R12036). The results were disclosed as CBDRB-FY23-P1571-R10504 and CBDRB-FY24-P1571-R11724.

*** OVERVIEW
The code in this replication package constructs the analysis file from the confidential Census Bureau datasets outlined below using Stata. The code then prepares Tables 2 and 5, along with Appendix Tables 9-11. The replicator should expect the code to run in less than one day once the Census Bureau data are prepared.

*** DATA AVAILABILITY AND PROVENANCE
All the results in the paper use confidential microdata from the U.S. Census Bureau. Data must be accessed through the Census Bureau via an approved project. No data can be made publicly available. We certify that the authors of the manuscript have legitimate access to and permission to use the data used in this manuscript.

To gain access to the Census microdata, follow the directions here on how to write a proposal for access to the data via a Federal Statistical Research Data Center: https://www.census.gov/topics/research/guidance/restricted-use-microdata/standard-application-process.html. 

You must request the following datasets in your proposal: 

Longitudinal Employer-Household Dynamics (LEHD) files (2000-2014):
	Job History File (JHF)
	Employer Characteristics File, Establishment (ECF SEIN)
	Individual Characteristics File (ICF US)
Decennial Census (2000)
American Community Survey (2001-2014)

For more information about the LEHD data see https://lehd.ces.census.gov/data/lehd-snapshot-doc/latest/sections/introduction.html. 

States used in our project: Arkansas, Arizona, California, Colorado, DC, Delaware, Iowa, Illinois, Indiana, Kansas, Maine, Maryland, Montana, Nebraska, New Mexico, Nevada, North Dakota, Oklahoma, Pennsylvania, Tennessee, Texas, Virginia, Washington.

*** COMPUTATIONAL REQUIREMENTS
Computational resources are determined and provided by the Census Bureau. SAS is typically used to extract the base data, and Stata is used for analytical purposes. The initial LEHD data build from its base files can require more than a day of computational time and more than 250 GB of memory space. The subsequent analysis required for this paper can be run within one day.

*** DESCRIPTION OF CODE
The code does not reproduce any numbers or figures in the paper without access to the original Census Bureau datasets. cenacs_men_pik.dta and cenacs_women_pik.dta are input files that are unique at the person level. Each observation is a person (PIK) from the 2000 Decennial Census or the 2001-2014 ACS. Due to Census requirements at the time of our project's approval, the data are deduplicated such that only one source dataset is used for each person.

The first part of the code (01a_get_cen_acs_viet.do) restricts those datasets to individuals born in Vietnam in 1962-1975 who migrated to the U.S. in 1989-1994. The code then retrieves from the raw LEHD EHF data the employment and earnings histories for those individuals working in one of our 23 LEHD states. The resulting dataset is unique by person (PIK)-year, but also retains the person's main job employer ID (SEIN) each year. The last part of the code uses the raw LEHD ECF data to develop characteristics of the employers at the SEIN-year level.
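
As a rough illustration of the sample restriction described above, the following Stata sketch applies the birth-cohort and arrival-year filters to a toy person-level file. The variable names (pik, birth_country, birth_year, arrival_year) and all values are hypothetical; they are not taken from the Census files or from 01a_get_cen_acs_viet.do, and the actual program may differ.

```stata
* Toy illustration only: values are made up and variable names are hypothetical.
clear
input str4 pik str10 birth_country birth_year arrival_year
"A001" "Vietnam" 1970 1991
"A002" "Vietnam" 1960 1990
"A003" "Vietnam" 1972 1996
"A004" "Mexico"  1968 1992
end

* Keep Vietnam-born individuals from the 1962-1975 birth cohorts
* who arrived in the U.S. in 1989-1994.
keep if birth_country == "Vietnam" ///
    & inrange(birth_year, 1962, 1975) ///
    & inrange(arrival_year, 1989, 1994)

* The person-level file should remain unique by PIK; isid errors out otherwise.
isid pik
```

In this toy file only the first record satisfies all three conditions.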

The second part of the code (01b_sein_year_vars_to_merge.do) creates annual characteristics of the main employer (SEIN), to be merged into the person-year data. Input data are the raw LEHD EHF and ICF files, along with the main employer IDs (SEIN) from the first program.
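
The merge of employer characteristics into the person-year data could look roughly like the following Stata sketch, which assumes a many-to-one merge on SEIN and year. The variable names (firm_size, earnings), the toy values, and the choice to keep unmatched person-years are all assumptions made for illustration; they do not describe the contents of 01b_sein_year_vars_to_merge.do.

```stata
* Toy illustration only: values are made up and variable names are hypothetical.

* Employer-level file, unique by SEIN-year.
clear
input str4 sein year firm_size
"S001" 2005 120
"S001" 2006 135
"S002" 2005 15
end
isid sein year
tempfile employer
save `employer'

* Person-level file, unique by PIK-year, carrying the main employer ID.
clear
input str4 pik year str4 sein earnings
"A001" 2005 "S001" 21000
"A001" 2006 "S001" 23500
"A002" 2005 "S002" 18000
"A002" 2006 "S003" 19000
end
isid pik year

* Many person-years can share one employer-year, hence m:1.
* keep(master match) retains person-years with no employer record (assumed choice).
merge m:1 sein year using `employer', keep(master match)

* Inspect how many person-years found an employer record.
tabulate _merge
```

Checking uniqueness with isid before the merge guards against accidental duplication of person-year rows, which a mis-specified merge key would otherwise introduce silently.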

The third program (02_final_data_and_analysis.do) combines the various datasets and runs the analyses shown in the paper.

Because some of the code had to be redacted for disclosure, the programs have been adjusted so that they do not reveal anything about the data or otherwise violate the disclosure rules.